Mapping Sparse Representation to State Likelihoods in Noise-Robust Automatic Speech Recognition

Authors

Katariina Mahkonen

Antti Hurmalainen

Tuomas Virtanen

Jort F. Gemmeke

Abstract

This paper proposes learning-based methods for mapping a sparse representation of noisy speech to state likelihoods in an automatic speech recognition system. We represent speech as a sparse linear combination of exemplars extracted from training data. The weights of exemplars are mapped to speech state likelihoods using Ordinary Least Squares (OLS) and Partial Least Squares (PLS) regression. Recognition experiments are conducted using the CHiME noisy speech database. According to the results, both algorithms can be successfully used for training the mapping. We achieve improvements over the previous binary labeling system, and recognition scores close to 70% at -6 dB SNR.
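The OLS mapping described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the data are synthetic, and the dimensions, the one-hot state targets, the clipping floor, and the helper names `fit_ols_mapping` and `map_to_likelihoods` are all assumptions made for illustration.

```python
import numpy as np

def fit_ols_mapping(weights, targets):
    """Least-squares fit of B so that weights @ B approximates targets."""
    B, *_ = np.linalg.lstsq(weights, targets, rcond=None)
    return B

def map_to_likelihoods(weights, B, floor=1e-8):
    """Map exemplar weights to non-negative, row-normalized state scores.

    Clipping and normalization are assumptions of this sketch, added so
    that the output can be read as likelihood-like values.
    """
    scores = np.maximum(weights @ B, floor)
    return scores / scores.sum(axis=1, keepdims=True)

# Synthetic stand-in data (hypothetical sizes, not from the paper).
rng = np.random.default_rng(0)
n_frames, n_exemplars, n_states = 200, 50, 10

# Sparse, non-negative exemplar weights: about 10% of entries are active.
mask = rng.random((n_frames, n_exemplars)) < 0.1
X = rng.random((n_frames, n_exemplars)) * mask

# One-hot state targets per frame (an assumed training target).
labels = rng.integers(0, n_states, size=n_frames)
Y = np.eye(n_states)[labels]

B = fit_ols_mapping(X, Y)              # shape: (n_exemplars, n_states)
likelihoods = map_to_likelihoods(X, B) # shape: (n_frames, n_states)
```

In a recognizer, rows of `likelihoods` would stand in for the per-frame state scores passed to the decoder; a PLS variant would replace only the fitting step.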


Similar resources

Artificial and online acquired noise dictionaries for noise robust ASR

Recent research has shown that speech can be sparsely represented using a dictionary of exemplars (speech segments spanning multiple frames), and that such a sparse representation can be recovered using Compressed Sensing techniques. In previous work we proposed a novel method for noise-robust automatic speech recognition in which we modelled noisy speech as a sparse linear combination of speech...


State-based labelling for a sparse representation of speech and its application to robust speech recognition

This paper proposes a state-based labelling for acoustic patterns of speech and a method for using this labelling in noise-robust automatic speech recognition. Acoustic time-frequency segments of speech, exemplars, are obtained from a training database and associated with time-varying state labels using the transcriptions. In the recognition phase, noisy speech is modeled by a sparse linear combi...


Voice-based Age and Gender Recognition using Training Generative Sparse Model

Gender recognition and age detection are important problems in telephone speech processing, where voice characteristics are used to investigate the identity of an individual. In this paper, a new gender and age recognition system is introduced, based on generative incoherent models learned using sparse non-negative matrix factorization and an atom correction post-processing method. Similar to genera...


A New Method for Speech Enhancement Based on Incoherent Model Learning in Wavelet Transform Domain

The quality of a speech signal degrades significantly in the presence of environmental noise, which impairs the performance of hearing aid devices, automatic speech recognition systems, and mobile phones. In this paper, single-channel enhancement of speech corrupted by additive noise is considered. A dictionary-based algorithm is proposed to train the speech...


Improving the performance of MFCC for Persian robust speech recognition

Mel-frequency cepstral coefficients (MFCCs) are the most widely used features in speech recognition, but they are very sensitive to noise. In this paper, to achieve satisfactory performance in Automatic Speech Recognition (ASR) applications, we introduce a new noise-robust set of MFCC vectors estimated through the following steps. First, spectral mean normalization is a pre-processing which applies to t...




Publication date: 2011
